Description

Speech recognition system, training arrangement and method of calculating iteration values
for free parameters of a maximum-entropy speech model



The invention relates to a method of calculating iteration values for free
parameters $\lambda_\alpha^{\mathrm{ortho}(n)}$ of a maximum-entropy speech model MESM in a speech recognition
system with the aid of the generalized iterative scaling training algorithm in accordance with
the following formula:

$$\lambda_\alpha^{\mathrm{ortho}(n+1)} = G\left(\lambda_\alpha^{\mathrm{ortho}(n)},\, m_\alpha^{\mathrm{ortho}},\, \ldots\right) \qquad (1)$$

where:

n : is an iteration parameter;

G : is a mathematical function;

α : is an attribute in the MESM; and

$m_\alpha^{\mathrm{ortho}}$ : is a desired orthogonalized boundary value in the MESM for the attribute α.

The invention further relates to a computer-supported speech recognition
system known in the state of the art, as well as a known computer-supported training
arrangement in which the method described is implemented.

A starting point for the formation of a speech model as used in a
computer-supported speech recognition system for recognizing entered speech is a predefined
training object. The training object maps certain statistical patterns in the language of a future
user of the speech recognition system into a system of mathematically formulated boundary
conditions, which system generally has the following form:

$$\sum_{(h,w)} N(h)\, p(w \mid h)\, f_\alpha(h,w) = m_\alpha \qquad (2)$$

where:

N(h) : refers to the frequency of history h in a training corpus;

p(w | h) : refers to the probability with which a predefined word w follows
a previous word sequence h (history);

$f_\alpha(h,w)$ : refers to a binary attribute function for an attribute α; and

$m_\alpha$ : refers to a desired boundary value in the system of boundary
conditions.



The solution of this system of boundary conditions, i.e. of the training object, is
formed by the so-termed maximum-entropy speech model MESM, which indicates a suitable
solution of the system of boundary conditions in the form of a suitable definition of the
probability p(w | h), which reads as follows:

$$p_\Lambda(w \mid h) = \frac{1}{Z_\Lambda(h)} \exp\left(\sum_\alpha \lambda_\alpha f_\alpha(h,w)\right) \qquad (3)$$

where:

$Z_\Lambda(h)$ : refers to a history-dependent standardization factor;

$\lambda_\alpha$ : refers to a free parameter for the attribute α;

Λ : refers to the set of all parameters. The remaining symbols are as defined
above.
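The form of equation 3 can be illustrated with a minimal Python sketch. The vocabulary, the two attribute functions and the parameter values below are hypothetical and serve only to show how the standardization factor makes the probabilities for one history sum to one.

```python
import math

def mesm_probability(w, h, vocab, lambdas, features):
    """p(w | h) = exp(sum_a lambda_a * f_a(h, w)) / Z(h), following equation 3."""
    def score(word):
        return math.exp(sum(lambdas[a] * f(h, word) for a, f in features.items()))
    z = sum(score(v) for v in vocab)  # history-dependent standardization factor Z(h)
    return score(w) / z

# Hypothetical toy setup: two binary attribute functions over a three-word vocabulary.
vocab = ["A", "WHITE", "HOUSE"]
features = {
    "HOUSE": lambda h, w: 1 if w == "HOUSE" else 0,
    "WHITE HOUSE": lambda h, w: 1 if h[-1:] == ("WHITE",) and w == "HOUSE" else 0,
}
lambdas = {"HOUSE": 0.5, "WHITE HOUSE": 1.0}  # assumed values, not trained
probs = [mesm_probability(v, ("WHITE",), vocab, lambdas, features) for v in vocab]
```

With all parameters set to zero the same function returns the uniform distribution over the vocabulary, which is a common starting point for an iterative training algorithm.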

The binary attribute function $f_\alpha(h,w)$ makes, for example, a binary decision
whether predefined word sequences h,w contain predefined words at certain locations. An
attribute α may generally refer to a single word, a word sequence, a word class (color or
verbs), a sequence of word classes or more complex patterns.

Fig. 4 shows predefined attributes in a speech model by way of example. For
example, the unigrams shown each represent a single word, the bigrams each represent a
word sequence consisting of two words and the trigram shown represents a word sequence
consisting of three words. The bigram "OR A" includes the unigram "A" and, in addition,
includes a further word; therefore it is referred to as having a larger range compared to the
unigram "A". Analogously, the trigram "A WHITE HOUSE" has a larger range than the
unigram "HOUSE" or than the bigram "WHITE HOUSE".
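By way of illustration, a binary attribute function for word n-grams and the "larger range" relation may be sketched as follows. The sketch assumes that an n-gram attribute is matched against the final words of the sequence h,w and that a larger-range attribute includes the smaller one as its final words; both function names are hypothetical.

```python
def attribute_fires(attribute, h, w):
    """Binary attribute function: 1 if the word sequence h, w ends with the n-gram."""
    sequence = tuple(h) + (w,)
    ngram = tuple(attribute.split())
    return 1 if sequence[-len(ngram):] == ngram else 0

def has_larger_range(beta, alpha):
    """True if n-gram beta is longer than alpha and includes alpha as its final words."""
    b, a = beta.split(), alpha.split()
    return len(b) > len(a) and b[-len(a):] == a
```

For instance, `has_larger_range("OR A", "A")` and `has_larger_range("A WHITE HOUSE", "WHITE HOUSE")` both hold, whereas `has_larger_range("A WHITE", "HOUSE")` does not.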

The free parameters λ are adapted so that equation 3 represents a solution for
the system of boundary conditions according to equation 2. This adaptation is normally made
with the aid of known training algorithms. An example of such a training algorithm is the
so-termed generalized iterative scaling (GIS) algorithm as it is described, for example, in J.N.
Darroch and D. Ratcliff, "Generalized iterative scaling for log-linear models", Annals Math.
Stat., 43(5): 1470-1480, 1972.

This GIS algorithm provides an iterative calculation of the free parameters λ.
Traditionally, however, this calculation is very slow. To expedite this calculation,
the state of the art proposes substituting orthogonalized attribute functions
$f_\alpha^{\mathrm{ortho}}(h,w)$ for the attribute functions $f_\alpha(h,w)$ in the system of boundary conditions in
accordance with equation (2); see for this purpose R. Rosenfeld, "A maximum-entropy
approach to adaptive statistical language modeling", Computer Speech and Language,
10:187-228, 1996. Because of the substitution of the attribute functions on the left in equation
2, however, the boundary values $m_\alpha$ on the right are also changed. This changes the original
system of boundary conditions, i.e. the original training object, in the customary
approaches for estimating the boundary values; for this purpose see Rosenfeld, loc. cit.,
page 205, first sentence of the last-but-one paragraph.

In this respect it can be established as a disadvantage of the state of the art that
when the calculation of the GIS algorithm is accelerated, the free parameters λ are trained to
a changed training object. The parameters λ calculated in this manner cause an
inadequate adaptation of the speech model to the original training object when they
are used in equation 3.

Starting from this state of the art it is an object of the invention to further
develop a known computer-supported speech recognition system, a computer-supported
training system and a known method of iteratively calculating free parameters $\lambda_\alpha^{\mathrm{ortho}(n)}$ of a
maximum-entropy speech model in the speech recognition system, so that they make a fast
calculation of the free parameters λ possible without a change of the original training object.

This object is achieved as claimed in patent claim 1 in that, with the known
above-described method of calculating the free parameters λ according to the GIS algorithm,
any desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ is calculated by linearly combining the
associated desired boundary value $m_\alpha$ with desired boundary values $m_\beta$ of attributes β that
have a larger range than the attribute α. Here $m_\alpha$ and $m_\beta$ are desired boundary values of the
original training object.

The use of the boundary values $m_\alpha^{\mathrm{ortho}}$ calculated in this manner makes it
possible in an advantageous manner to make an improved approximation of the free
parameters λ and thus an improvement of the speech model with a view to the original
training object. This qualitative improvement is possible while a high convergence speed
continues to be achieved for the free parameters λ during the iterative calculation with the aid of
the GIS algorithm.

The use of the desired orthogonalized boundary values $m_\alpha^{\mathrm{ortho}}$ calculated
according to the invention is recommended for several variants of the GIS training algorithm
as they are described in the dependent claims 12 and 13.

The object of the invention is furthermore achieved by a speech recognition
system based on the maximum-entropy speech model MESM as claimed in claim 14 and a
training system for training the MESM as claimed in claim 15.

By implementing the method according to the invention in the training system,
the MESM in the speech recognition system is adapted more effectively than in the state of
the art to the individual language peculiarities of a certain user of the speech recognition
system; the rate at which the speech recognition system then correctly recognizes the
semantic content in the user's speech is improved considerably.

Otherwise the advantages of this speech recognition system and of the training
system correspond to the advantages discussed above for the method.

The following Figures are added to the description of the invention, in which 

Figs. 1a and 1b describe a method according to the invention of calculating a
desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$;

Figs. 2a and 2b describe a method according to the invention of calculating an
orthogonalized attribute function $f_\alpha^{\mathrm{ortho}}$;

Fig. 3 describes a block diagram of a speech recognition system according to
the invention;

Fig. 4 describes an attribute tree.

In the following first a detailed description is given of an example of
embodiment of the invention while reference is made to Figs. 1a and 1b.

Figs. 1a and 1b illustrate a method according to the invention of calculating an
improved desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ for an attribute $\alpha = \beta_0$ in a speech
model. In a first step S1 of the method all the attributes $\beta_i$ with i = 1 ... g that have a so-termed
larger range than the predefined attribute $\alpha = \beta_0$, i.e. which include this at a predefined
position, are determined. Subsequently, in a method step S2 a
desired boundary value $m_{\beta_i}$ of the original training object is calculated for all the attributes $\beta_i$
with i = 0 ... g, thus also for the attribute $\alpha = \beta_0$.

For the calculation of such a desired boundary value $m_{\beta_i}$, several methods are
known in the state of the art.

According to a first method the calculation is made in that first a frequency
$N(\beta_i)$ is determined with which the associated binary attribute function $f_{\beta_i}$ yields the value 1
when a training corpus of the speech model is used and that, subsequently, the thus
determined frequency value $N(\beta_i)$ is smoothed.

According to a second, alternative method, the calculation is performed by
reducing the quantities of attributes in the speech model until the boundary conditions no
longer demonstrate conflicts. This sort of reduction in the quantity of attributes must be very
extensive in practical situations, since otherwise the generated speech model will no longer
represent a solution to the original training object. According to a third method, the
calculation is made by using a so-called induced speech model as it is described in J. Peters
and D. Klakow, "Compact Maximum Entropy Language Models", Proc. ASRU, Keystone,
Colorado, 1999.

In a method step S3 all the attributes $\beta_i$ are subsequently sorted according to
their range, where an attribute $\beta_i$ that has the largest range is assigned the index i = g. It may
well happen that individual classes of ranges, for example the class of bigrams
or the class of trigrams, are assigned a plurality of attributes $\beta_i$. In these cases a plurality of
attributes $\beta_i$ having different, but successive indices i are assigned to one and the same class
of ranges, i.e. these attributes then always have the same range RW and belong to the same class of
ranges.

For the method to be carried out, in which in the successive steps the
individual attributes $\beta_i$ are evaluated one after the other, it is important for the attributes to be
processed according to decreasing (or constant) range. In the first run of the method a start is
therefore made with an attribute $\beta_i$ which is assigned to the highest class of ranges;
preferably i is set equal to g (see method steps S4 and S5 in Fig. 1a).

In a subsequent method step S6 a check is then made whether larger-range
attributes $\beta_k$ with i < k ≤ g, which include the attribute $\beta_i$, occur for the currently selected
attribute $\beta_i$. With the first run the attribute $\beta_i$ with i = g automatically belongs to the class
that has the largest range, as observed above, and therefore the query in the method step S6 is
to be answered in the negative for this attribute $\beta_i$. In this case the method jumps to method
step S8 where a parameter X is set to zero. Then a calculation is made of an improved desired
orthogonalized boundary value $m_{\beta_i}^{\mathrm{ortho}}$ for the attribute $\beta_i$ (in the first run with i = g) in
accordance with method step S9. As can be seen there, this boundary value for the attribute
$\beta_i$ is set equal to the desired boundary value $m_{\beta_i}$ calculated in step S2, if the parameter X = 0
(this is the case, for example, during the first run).

The method steps S5 to S11 are then successively repeated for all the attributes
$\beta_{i-1}$ with i-1 = g-1 ... 0. In the method step S10 the index i is re-initialized, which is
necessary, and in method step S11 a query is made whether all the attributes $\beta_i$ with i = 0 ... g
have been processed.

For all attributes $\beta_i$ for which there are attributes $\beta_k$ with i < k ≤ g that have a
larger range, the query in method step S6 must be answered with "Yes". The parameter X is
then not set to zero but is instead calculated according to method step S7 by totaling the
corresponding improved desired orthogonalized boundary values $m_{\beta_k}^{\mathrm{ortho}}$, each calculated in
previous run-throughs in method step S9 for the respective attributes $\beta_k$ that have a larger
range.

Once it has been determined in method step S11 that the desired
orthogonalized boundary value $m_{\beta_0}^{\mathrm{ortho}}$ has been calculated in method step S9, this is then
output in method step S12 as $m_\alpha^{\mathrm{ortho}}$. The method according to the invention extensively
described just now for the calculation of the improved desired orthogonalized boundary value
$m_\alpha^{\mathrm{ortho}}$ may be shortened to the following formula:

$$m_\alpha^{\mathrm{ortho}} = m_\alpha - \sum_{(*)} m_\beta^{\mathrm{ortho}} \qquad (4)$$

The sum (*) includes all attributes β that have a larger range and contain the
predefined attribute α. For calculating the boundary value $m_\beta^{\mathrm{ortho}}$ said formula can be used in
an almost recursive manner for each attribute β again and again until the sum term disappears
for certain attributes, that is, for those with the largest range, because there are no more
attributes that have a larger range for them. The desired orthogonalized boundary values for
the attributes $\beta_k$ that have the largest range then correspond to the respective originally
desired boundary values $m_{\beta_k}$.
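The processing order of method steps S3 to S12 can be sketched in Python as follows. The sketch assumes n-gram attributes written as space-separated words, takes the number of words as the range, and uses hypothetical boundary values for three nested attributes; the helper `suffix_contains` is likewise an assumption about what "includes at a predefined position" means.

```python
def orthogonalized_boundary_values(m, contains):
    """Formula (4): m_ortho[a] = m[a] - sum of m_ortho[b] over larger-range b containing a.

    m        : dict attribute -> desired boundary value of the original training object
    contains : function (beta, alpha) -> True if beta has a larger range and includes alpha
    """
    m_ortho = {}
    # Process the attributes by decreasing range (here: number of words), as in S3 to S11.
    for alpha in sorted(m, key=lambda a: len(a.split()), reverse=True):
        x = sum(m_ortho[beta] for beta in m_ortho if contains(beta, alpha))  # S7; 0 if none (S8)
        m_ortho[alpha] = m[alpha] - x                                        # S9
    return m_ortho

def suffix_contains(beta, alpha):
    """Assumed inclusion test: beta is longer and ends with the words of alpha."""
    b, a = beta.split(), alpha.split()
    return len(b) > len(a) and b[-len(a):] == a

# Hypothetical boundary values for a chain of three nested attributes.
values = orthogonalized_boundary_values(
    {"Z": 10.0, "Y Z": 6.0, "X Y Z": 2.0}, suffix_contains)
```

Because every larger-range attribute has already been processed when a smaller one is reached, a single pass over the sorted attributes is sufficient.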

The implementation of the method according to the invention as shown in
Figs. 1a and 1b will be further explained hereinafter while use is made of the following
training corpus of a speech model used by way of example. The training corpus reads:

"THAT WAS A RED
OR A GREEN HOUSE
OR A BLUE HOUSE
THIS IS A WHITE HOUSE AND
THAT IS THE WHITE HOUSE"

The training corpus consists of N = 23 individual words. It is assumed that in
the speech model the desired unigram, bigram and trigram attributes are predefined according
to Fig. 4.

Then, by using the normal attribute function $f_\alpha$ for the training corpus it may
be established that the unigrams, bigrams and trigrams according to Fig. 4 occur in the
training corpus with the following frequencies:



Unigrams:

A               4
HOUSE           4
IS              2
OR              2
THAT            2
WHITE           2

Bigrams:

A WHITE         1
OR A            2
WHITE HOUSE     2

Trigrams:

A WHITE HOUSE   1
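The frequencies listed above can be checked with a short Python sketch. It treats the training corpus as one continuous sequence of words, which is an assumption about how the line breaks of the example are to be read.

```python
from collections import Counter

corpus = ("THAT WAS A RED OR A GREEN HOUSE OR A BLUE HOUSE "
          "THIS IS A WHITE HOUSE AND THAT IS THE WHITE HOUSE").split()

def ngram_counts(words, n):
    """Count every run of n consecutive words in the corpus."""
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

unigrams, bigrams, trigrams = (ngram_counts(corpus, n) for n in (1, 2, 3))
```

Looking up, for example, `unigrams["HOUSE"]`, `bigrams["WHITE HOUSE"]` and `trigrams["A WHITE HOUSE"]` yields the frequencies 4, 2 and 1.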

In the example shown here the improved desired orthogonalized boundary
value $m_\alpha^{\mathrm{ortho}}$ is to be calculated for the attribute α = "HOUSE". For this purpose first,
according to method step S1 in Fig. 1a, all attributes that have a larger range are to be
determined for the attribute α. According to Fig. 4 they are the bigram "WHITE HOUSE"
and the trigram "A WHITE HOUSE". According to method step S2 the normal desired
boundary values are to be calculated for these attributes that have a larger range but also for
the attribute α, for example, in that the respective frequencies established above are
smoothed. This smoothing is effected here, for example, by subtracting the value 0.1. Thus
the following normal desired boundary values are the result:

$m_\alpha$ : "HOUSE" = 4 - 0.1 = 3.9

$m_{\beta_1}$ : "WHITE HOUSE" = 2 - 0.1 = 1.9

$m_{\beta_2}$ : "A WHITE HOUSE" = 1 - 0.1 = 0.9.

The attributes α, $\beta_1$, $\beta_2$ are now sorted according to their range and, starting
with the widest ranging attribute, the respective improved desired orthogonalized boundary
values are calculated according to formula (4) or according to method steps S7-S9 in Figs. 1a
and 1b:



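$$m_{\beta_2}^{\mathrm{ortho}} = m_{\beta_2} = 0.9 \qquad (5)$$

$$m_{\beta_1}^{\mathrm{ortho}} = m_{\beta_1} - m_{\beta_2}^{\mathrm{ortho}} = 1.9 - 0.9 = 1 \qquad (6)$$

Finally, the improved desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ is
calculated for the attribute α to:

$$m_\alpha^{\mathrm{ortho}} = m_\alpha - m_{\beta_1}^{\mathrm{ortho}} - m_{\beta_2}^{\mathrm{ortho}} = 3.9 - 1 - 0.9 = 2 \qquad (7)$$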
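The numbers of this example can be reproduced with a small Python sketch that applies the smoothing by 0.1 and then evaluates the shortened formula recursively. The names `DISCOUNT` and `m_ortho` are hypothetical, and the inclusion test assumes that a larger-range n-gram ends with the words of the smaller one.

```python
counts = {"HOUSE": 4, "WHITE HOUSE": 2, "A WHITE HOUSE": 1}
DISCOUNT = 0.1  # smoothing of the example: subtract 0.1 from each frequency

# Normal desired boundary values: 3.9, 1.9 and 0.9.
m = {attribute: n - DISCOUNT for attribute, n in counts.items()}

def m_ortho(alpha):
    """Boundary value of alpha minus the orthogonalized values of all larger-range attributes."""
    larger = [beta for beta in m if beta.endswith(" " + alpha)]
    return m[alpha] - sum(m_ortho(beta) for beta in larger)

result = {attribute: round(m_ortho(attribute), 6) for attribute in m}
```

The result for "HOUSE" is 2, which coincides with the number of occurrences of HOUSE in the corpus that are not preceded by WHITE.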



PHDE010032 



01.02.2002 



The orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ calculated according to the
invention makes a sufficiently accurate calculation of the free parameters λ possible, and thus
of the probability according to formula (3), with a view to the original training object, while the
calculation speed remains the same when used in the GIS training algorithm.

Hereinafter the use of the boundary value $m_\alpha^{\mathrm{ortho}}$ calculated according to the
invention will be represented for three different variants of the GIS training algorithm.

With a first variant of the GIS training algorithm the mathematical function G
according to equation 1 has the following form when the orthogonalized boundary value
$m_\alpha^{\mathrm{ortho}}$ calculated according to the invention is used:



(8) 



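where:

n : refers to an iteration parameter;

α : refers to a just considered attribute;

γ : refers to all the attributes in the speech model;

$t_\alpha^{\mathrm{ortho}}$, $t_\gamma^{\mathrm{ortho}}$ : refer to the size of the convergence step;

$m_\alpha^{\mathrm{ortho}}$, $m_\gamma^{\mathrm{ortho}}$ : refer to the desired orthogonalized boundary values in the MESM for the
attributes α and γ;

$m_\alpha^{\mathrm{ortho}(n)}$, $m_\gamma^{\mathrm{ortho}(n)}$ : refer to iterative approximate values for the desired
boundary values $m_\alpha^{\mathrm{ortho}}$, $m_\gamma^{\mathrm{ortho}}$; and

$b_\alpha$ and $b_\gamma$ : refer to constants.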

The calculation of the convergence step sizes t and of the iterative approximate
values for the desired boundary values m is effected, as will be shown hereinafter, by the
use of an orthogonalized attribute function $f_\alpha^{\mathrm{ortho}}$ defined according to the invention, which
reads as follows:

$$f_\alpha^{\mathrm{ortho}} = f_\alpha - \sum_{(*)} f_\beta^{\mathrm{ortho}} \qquad (9)$$

It should be observed at this point that the orthogonalized attribute function
$f_\alpha^{\mathrm{ortho}}$ calculated according to the invention in accordance with equation 9 corresponds as
regards value to the attribute function proposed by Rosenfeld, loc. cit. However,
its calculation according to the invention is effected in a totally different way, as can be seen in Figs.
2a and 2b. The calculation method is analogous to the method described in Figs.
1a and 1b for the calculation of the desired orthogonalized boundary values $m_\alpha^{\mathrm{ortho}}$, where
only the symbol for the boundary value m is to be replaced by the symbol for the attribute
function f and the parameter X by the function F. To avoid repetitions, reference is made here
to the description of Figs. 1a and 1b for explanations of the method according to Figs. 2a and
2b.

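With the orthogonalized attribute function $f_\alpha^{\mathrm{ortho}}$ or $f_\gamma^{\mathrm{ortho}}$ thus calculated
according to the invention, the size of the convergence steps $t_\alpha^{\mathrm{ortho}}$ and $t_\gamma^{\mathrm{ortho}}$ is calculated in
equation 8 as follows:

$$t_\alpha^{\mathrm{ortho}} = t_\gamma^{\mathrm{ortho}} = \frac{1}{M^{\mathrm{ortho}}} \quad \text{with} \quad M^{\mathrm{ortho}} = \max_{(h,w)} \left( \sum_\beta f_\beta^{\mathrm{ortho}}(h,w) \right) \qquad (10)$$

where $M^{\mathrm{ortho}}$ for binary attribute functions $f_\beta^{\mathrm{ortho}}$ represents the maximum number of
functions which yield the value 1 for the same argument (h,w).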
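The effect of the orthogonalization on the convergence step size may be illustrated with the attributes of Fig. 4 and the example corpus. The following Python sketch is a simplified illustration under stated assumptions: attributes are word n-grams matched against the final words of the sequence h,w, the history is cut to two words, and only arguments (h,w) that occur in the corpus are examined.

```python
ATTRIBUTES = ["A", "HOUSE", "IS", "OR", "THAT", "WHITE",
              "A WHITE", "OR A", "WHITE HOUSE", "A WHITE HOUSE"]

def f(attribute, sequence):
    """Ordinary binary attribute function: 1 if the sequence h, w ends with the n-gram."""
    ngram = tuple(attribute.split())
    return 1 if tuple(sequence[-len(ngram):]) == ngram else 0

def f_ortho(attribute, sequence):
    """Attribute function minus the orthogonalized functions of all larger-range attributes."""
    larger = [beta for beta in ATTRIBUTES if beta.endswith(" " + attribute)]
    return f(attribute, sequence) - sum(f_ortho(beta, sequence) for beta in larger)

def max_active(func, sequences):
    """Largest number of attribute functions that yield 1 for the same argument."""
    return max(sum(func(a, s) for a in ATTRIBUTES) for s in sequences)

corpus = ("THAT WAS A RED OR A GREEN HOUSE OR A BLUE HOUSE "
          "THIS IS A WHITE HOUSE AND THAT IS THE WHITE HOUSE").split()
# Arguments (h, w) seen in the corpus, with the history h cut to two words (an assumption).
sequences = [corpus[max(0, i - 2):i + 1] for i in range(len(corpus))]

M = max_active(f, sequences)              # without orthogonalization
M_ortho = max_active(f_ortho, sequences)  # with orthogonalization
t_ortho = 1 / M_ortho                     # convergence step size
```

In this toy setting up to three ordinary attribute functions yield 1 for the same argument (for "A WHITE HOUSE"), whereas at most one orthogonalized function does, so that the step size rises from 1/3 to 1.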

Furthermore, with the attribute function $f_\alpha^{\mathrm{ortho}}$ defined according to the
invention, the iterative approximate value $m_\alpha^{\mathrm{ortho}(n)}$ can be calculated for the desired
orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ when the following equation is used:

$$m_\alpha^{\mathrm{ortho}(n)} = \sum_{(h,w)} N(h)\, p^{(n)}(w \mid h)\, f_\alpha^{\mathrm{ortho}}(h,w) \qquad (11)$$

where:

N(h) : refers to the frequency of the history h in the training corpus; and

$p^{(n)}(w \mid h)$ : refers to an iteration value for the probability p(w | h) with which a
predefined word w follows a previous word sequence h (history).

Here $p^{(n)}(w \mid h)$ uses the parameter values $\lambda_\alpha^{\mathrm{ortho}(n)}$.
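The iterative approximate value of a boundary value is a sum over all histories and words, weighted with the history frequency and the current probability. A minimal Python sketch of this sum is given below; the history counts, the vocabulary, the uniform stand-in for the current iterate and the single orthogonalized attribute function are all hypothetical.

```python
def expected_value(feature, history_counts, vocab, prob):
    """Sum over (h, w) of N(h) * p_n(w | h) * f(h, w)."""
    return sum(n_h * prob(w, h) * feature(h, w)
               for h, n_h in history_counts.items()
               for w in vocab)

# Hypothetical toy data: one-word histories with their frequencies N(h).
vocab = ["A", "WHITE", "HOUSE", "GREEN"]
history_counts = {("WHITE",): 2, ("GREEN",): 1, ("A",): 4}

def uniform(w, h):
    """Stand-in for the current iterate p_n(w | h), e.g. with all parameters equal to zero."""
    return 1 / len(vocab)

def f_house_ortho(h, w):
    """Assumed orthogonalized unigram HOUSE: fires only if the bigram WHITE HOUSE does not."""
    return 1 if w == "HOUSE" and h[-1] != "WHITE" else 0

m_n = expected_value(f_house_ortho, history_counts, vocab, uniform)
```

Here the histories GREEN and A contribute 1 · 0.25 and 4 · 0.25, while the history WHITE contributes nothing, so that `m_n` is 1.25.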

The use of the improved desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$
calculated according to the invention is furthermore recommended for a second variant of the
GIS training algorithm. Here the attributes of the MESM are subdivided into m groups Ai
and for each iteration only the parameters $\lambda_\alpha^{\mathrm{ortho}}$ of the attributes α from one of the groups are
changed according to the following formula:



(12) 



where:

n : represents the iteration parameter;

Ai(n) : represents an attribute group Ai(n) with 1 ≤ i ≤ m selected in
the n-th iteration step;

α : represents a just considered attribute from the just selected
attribute group Ai(n);

β : represents all attributes from the attribute group Ai(n);

$t_\alpha^{\mathrm{ortho}}$, $t_\beta^{\mathrm{ortho}}$ : represent the size of the convergence step with
$t_\alpha^{\mathrm{ortho}} = t_\beta^{\mathrm{ortho}} = 1/M_{i(n)}^{\mathrm{ortho}}$, where
$M_{i(n)}^{\mathrm{ortho}}$ for binary functions $f_\beta^{\mathrm{ortho}}$ represents the maximum number of
functions from the attribute group Ai(n) which yield the value 1
for the same argument (h,w);

$m_\alpha^{\mathrm{ortho}}$, $m_\beta^{\mathrm{ortho}}$ : represent the desired orthogonalized boundary values in the MESM
for the attributes α and β respectively;

$m_\alpha^{\mathrm{ortho}(n)}$, $m_\beta^{\mathrm{ortho}(n)}$ : represent iterative approximate values for the desired boundary
values $m_\alpha^{\mathrm{ortho}}$, $m_\beta^{\mathrm{ortho}}$.

The group Ai(n) of attributes α whose parameters $\lambda_\alpha^{\mathrm{ortho}}$ are adapted in the
current iteration step then cyclically runs through all the m groups in accordance with
i(n) = n (mod m).
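The cyclic selection of the attribute group can be sketched in a few lines of Python. The split of the Fig. 4 attributes into three groups by n-gram length is hypothetical, and the groups are indexed from 0 to m-1 here so that the remainder n mod m can be used directly as a list index.

```python
def select_group(n, groups):
    """Return the attribute group updated in iteration n, cycling with i(n) = n mod m."""
    return groups[n % len(groups)]

# Hypothetical split of the Fig. 4 attributes into m = 3 groups by n-gram length.
groups = [["A", "HOUSE", "IS", "OR", "THAT", "WHITE"],
          ["A WHITE", "OR A", "WHITE HOUSE"],
          ["A WHITE HOUSE"]]
schedule = [select_group(n, groups)[0] for n in range(5)]
```

Over five iterations the schedule visits the unigram, bigram and trigram groups in turn and then starts again with the unigram group.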

The use of the desired orthogonalized boundary value $m_\alpha^{\mathrm{ortho}}$ calculated
according to the invention is further recommended for a third variant of the GIS training
algorithm, which distinguishes itself from the second variant only in that the attribute group
Ai(n) to be used for each iteration step is not selected cyclically but according to a predefined
criterion $D_i^{(n)}$.

Fig. 3 finally shows a speech recognition system 10 of the type according to
this invention which is based on the so-termed maximum-entropy speech model. It includes a
recognition device 12 which attempts to recognize the semantic content of supplied speech
signals. The speech signals are generally supplied to the speech recognition system in the
form of output signals from a microphone 20. The recognition device 12 recognizes the
semantic content of the speech signals by mapping patterns in the received acoustic signal onto
predefined recognition symbols such as specific words, actions or events, using the
implemented maximum-entropy speech model MESM. Finally, the recognition device 12
outputs a signal which represents the semantic content recognized in the speech signal and
can be used to control all kinds of equipment, for example a word-processing program or
telephone.

To make the control of the equipment as error-free as possible in terms of the
semantic content of speech information used as a control medium, the speech recognition
system 10 must recognize the semantic content of the speech to be evaluated as correctly as
possible. To do this, the speech model must be adapted as effectively as possible to the
linguistic peculiarities of the speaker, i.e. the user of the speech recognition system. This
adaptation is performed by a training system 14 which can be operated either externally or
integrated into the speech recognition system 10. More precisely, the training system 14
is used to adapt the MESM in the speech recognition system 10 to recurrent statistical
patterns in the speech of a particular user.

Both the recognition device 12 and the training system 14 are normally, 
although not necessarily, in the form of software modules and run on a suitable computer (not 
shown). 



